extremist content
XGUARD: A Graded Benchmark for Evaluating Safety Failures of Large Language Models on Extremist Content
Abishethvarman, Vadivel, Chandna, Bhavik, Jalan, Pratik, Naseem, Usman
Large Language Models (LLMs) can generate content ranging from ideological rhetoric to explicit instructions for violence. However, existing safety evaluations often rely on simplistic binary labels (safe and unsafe), overlooking the nuanced spectrum of risk these outputs pose. To address this, we present XGUARD, a benchmark and evaluation framework designed to assess the severity of extremist content generated by LLMs. XGUARD includes 3,840 red-teaming prompts sourced from real-world data such as social media and news, covering a broad range of ideologically charged scenarios. Our framework categorizes model responses into five danger levels (0 to 4), enabling a more nuanced analysis of both the frequency and severity of failures. We introduce the interpretable Attack Severity Curve (ASC) to visualize vulnerabilities and compare defense mechanisms across threat intensities. Using XGUARD, we evaluate six popular LLMs and two lightweight defense strategies, revealing key insights into current safety gaps and trade-offs between robustness and expressive freedom. Our work underscores the value of graded safety metrics for building trustworthy LLMs.
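The abstract does not give the ASC's exact formulation. One plausible reading, sketched below under that assumption, is a curve reporting the fraction of model responses graded at or above each danger level, so a safer model's curve drops off faster; the function name and example grades are illustrative, not taken from the paper.

```python
from collections import Counter

def attack_severity_curve(levels, max_level=4):
    """Fraction of responses with danger level >= t, for t = 0..max_level.

    `levels` holds one graded danger level per red-teaming prompt,
    where 0 is a safe refusal and 4 is the most dangerous output.
    """
    n = len(levels)
    counts = Counter(levels)
    curve = []
    remaining = n  # responses not yet excluded by the rising threshold
    for t in range(max_level + 1):
        curve.append(remaining / n)
        remaining -= counts.get(t, 0)
    return curve

# Example: 10 graded responses, mostly safe, two severe failures
levels = [0, 0, 0, 0, 1, 1, 2, 3, 4, 4]
print(attack_severity_curve(levels))  # [1.0, 0.6, 0.4, 0.3, 0.2]
```

Under this reading, comparing two defenses amounts to overlaying their curves: a defense that only suppresses mild outputs lowers the left of the curve, while one that blocks severe outputs lowers the right.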
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > Sri Lanka (0.04)
- Media (1.00)
- Government (1.00)
- Law Enforcement & Public Safety > Terrorism (0.99)
- (2 more...)
ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content
Chandna, Bhavik, Aboujenane, Mariam, Naseem, Usman
Large Multimodal Models (LMMs) are increasingly vulnerable to AI-generated extremist content, including photorealistic images and text, which can be used to bypass safety mechanisms and generate harmful outputs. However, existing datasets for evaluating LMM robustness offer limited exploration of extremist content, often lacking AI-generated images, diverse image generation models, and comprehensive coverage of historical events, which hinders a complete assessment of model vulnerabilities. To fill this gap, we introduce ExtremeAIGC, a benchmark dataset and evaluation framework designed to assess LMM vulnerabilities against such content. ExtremeAIGC simulates real-world events and malicious use cases by curating diverse text- and image-based examples crafted using state-of-the-art image generation techniques. Our study reveals alarming weaknesses in LMMs, demonstrating that even cutting-edge safety measures fail to prevent the generation of extremist material. We systematically quantify the success rates of various attack strategies, exposing critical gaps in current defenses and emphasizing the need for more robust mitigation strategies.
- Asia > Russia (0.14)
- Asia > Middle East > Syria (0.14)
- Asia > Middle East > Iran (0.14)
- (14 more...)
- Media (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Government > Military (1.00)
Australia's spy chief warns AI will accelerate online radicalisation
The head of Australia's peak intelligence agency has warned that people like the Christchurch terrorist are being radicalised on social media, and artificial intelligence is likely to make it much worse. The director general of the Australian Security Intelligence Organisation (Asio), Mike Burgess, told a social media summit in Adelaide on Friday that social media is "both a goldmine and a cesspit" that creates communities and divides them, and the internet was "the world's most potent incubator of extremism". He said people were embracing anti-authority ideologies, conspiracy theories and diverse grievances, and while social media was not the sole driver, he said Asio considered it a "significant driver". "Social media allows extremist ideologies, conspiracies, dis- and misinformation to be shared at an unprecedented scale and speed," he said. He said radicalisation can now take days and weeks rather than months and years as it previously did, with the most likely perpetrator of a terrorist attack being a lone actor.
- Oceania > Australia (0.95)
- Oceania > New Zealand > South Island > Canterbury > Christchurch (0.05)
- Europe > France (0.05)
Assessing Large Language Models for Online Extremism Research: Identification, Explanation, and New Knowledge
Dong, Beidi, Lee, Jin R., Zhu, Ziwei, Srinivasan, Balassubramanian
The United States has experienced a significant increase in violent extremism, prompting the need for automated tools to detect and limit the spread of extremist ideology online. This study evaluates the performance of Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-Trained Transformers (GPT) in detecting and classifying online domestic extremist posts. We collected social media posts containing "far-right" and "far-left" ideological keywords and manually labeled them as extremist or non-extremist. Extremist posts were further classified into one or more of five contributing elements of extremism based on a working definitional framework. The BERT model's performance was evaluated based on training data size and knowledge transfer between categories. We also compared the performance of GPT 3.5 and GPT 4 models using different prompts: naïve, layperson-definition, role-playing, and professional-definition. Results showed that the best-performing GPT models outperformed the best-performing BERT models, with more detailed prompts generally yielding better results. However, overly complex prompts may impair performance. Different versions of GPT have unique sensitivities to what they consider extremist. GPT 3.5 performed better at classifying far-left extremist posts, while GPT 4 performed better at classifying far-right extremist posts. Large language models, represented by GPT models, hold significant potential for online extremism classification tasks, surpassing traditional BERT models in a zero-shot setting. Future research should explore human-computer interactions in optimizing GPT models for extremist detection and classification tasks to develop more efficient (e.g., quicker, less effort) and effective (e.g., fewer errors or mistakes) methods for identifying extremist content.
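The four prompt styles the study compares can be sketched as templates of increasing detail. The wordings below are hypothetical stand-ins, not the paper's actual prompts; only the four style names come from the abstract.

```python
# Illustrative prompt templates for zero-shot extremism classification.
# The paper's exact wordings are not published in the abstract, so these
# are hedged approximations of the four styles it names.
PROMPT_STYLES = {
    "naive": "Is the following post extremist? Answer yes or no.\n\n{post}",
    "layperson_definition": (
        "Extremism means holding views far outside the mainstream that may "
        "justify hostility or violence. Is the following post extremist? "
        "Answer yes or no.\n\n{post}"
    ),
    "role_playing": (
        "You are a content moderator reviewing social media posts. "
        "Is the following post extremist? Answer yes or no.\n\n{post}"
    ),
    "professional_definition": (
        "Using a working definition of extremism as the advocacy of "
        "ideological goals through violence, intolerance, or dehumanization, "
        "classify the following post as extremist or non-extremist.\n\n{post}"
    ),
}

def build_prompt(style: str, post: str) -> str:
    """Fill the chosen template with the post text to classify."""
    return PROMPT_STYLES[style].format(post=post)
```

The study's finding that detail helps up to a point suggests sweeping these styles against a labeled sample and picking the template with the best F1, rather than assuming the most elaborate prompt wins.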
- Europe > Germany (0.14)
- North America > United States > Virginia > Fairfax County > Fairfax (0.04)
- North America > United States > Washington > King County > Bellevue (0.04)
- (8 more...)
- Media (1.00)
- Law > Civil Rights & Constitutional Law (1.00)
- Law Enforcement & Public Safety > Terrorism (1.00)
- (7 more...)
Halting the Flow of Terrorist Propaganda with AI
Extremist groups like ISIS have used the internet to propagate their ideologies and recruit individuals for more than a decade. Social media played an important role in the rise of ISIS [1], and increased terrorist activity online is described by the United Nations Office of Counter Terrorism as practically synonymous with modern terrorism [2]. The amount of ISIS-related content is staggering; hundreds of millions of pieces of extremist information are posted on the internet every year. A major problem is identifying ISIS propaganda in the first place; the group evades detection in numerous ways, including mixing their material with content from legitimate news outlets, blurring ISIS branding, and hijacking Facebook accounts [3]. When you combine the evasive tactics with highly heterogeneous and dynamic online environments, traditional content analysis fails to properly characterize which online material is extremist in origin and which is not.
- Asia > Middle East > Syria (0.07)
- Europe > France > Provence-Alpes-Côte d'Azur > Alpes-Maritimes > Nice (0.05)
- Asia > Middle East > Iraq (0.05)
Facebook trains artificial intelligence on 'hateful memes'
Facebook unveiled an initiative Tuesday to take on "hateful memes" by using artificial intelligence, backed by crowdsourcing, to identify maliciously motivated posts. The leading social network said it had already created a database of 10,000 memes -- images often blended with text to deliver a specific message -- as part of a ramped-up effort against hate speech. Facebook said it was releasing the database to researchers as part of a "hateful memes challenge" to develop improved algorithms to detect hate-driven visual messages, with a prize pool of $100,000. "These efforts will spur the broader AI research community to test new methods, compare their work, and benchmark their results in order to accelerate work on detecting multimodal hate speech," Facebook said in a blog post. Facebook's effort comes as it leans more heavily on AI to filter out objectionable content during the coronavirus pandemic that has sidelined most of its human moderators.
- Oceania > New Zealand (0.06)
- North America > United States > Kansas (0.06)
Artificial intelligence can't solve online extremism issue, experts tell House panel
A group of experts on Tuesday warned a House panel that artificial intelligence is not capable of sweeping up the full breadth of online extremist content -- in particular posts from white supremacists. At a House Homeland Security subcommittee hearing, lawmakers cast doubt on claims from top tech companies that artificial intelligence, or AI, will one day be able to detect and take down terrorist and extremist content without any human moderation. Rep. Max Rose (D-N.Y.), the chairman of the counterterrorism subcommittee holding the hearing, said he is fed up with responses from companies like Google, Twitter and Facebook about their failure to take down extremist posts and profiles, calling it "wanton disregard for national security obligations." "We are hearing the same thing from social media companies, and that is, 'AI's got this, it's only gonna get better,' " Rose said during his opening remarks. "Nonetheless ... we have seen egregious problems."
- North America > United States (0.56)
- Oceania > New Zealand > South Island > Canterbury Region > Christchurch (0.06)
Big Tech is overselling AI as the solution to online extremism
In mid-September the European Union threatened to fine the Big Tech companies if they did not remove terrorist content within one hour of it appearing online. The move came because rising tensions are increasingly developing and playing out on social media platforms. Social conflicts that once built up in backroom meetings and came to a head on city streets are now building momentum on social media platforms before spilling over into real life. In the past, governments tended to control traditional media, with little to no possibility for individuals to broadcast hate. The digital revolution has altered everything.
- North America > United States (0.49)
- Asia > Middle East > Syria (0.16)
- North America > Canada (0.07)
- (5 more...)
- Media (1.00)
- Information Technology (1.00)
- Government > Regional Government (1.00)
- Law Enforcement & Public Safety > Terrorism (0.75)
New AI technology used by UK government to fight extremist content
The UK Home Office on Monday unveiled a £600,000 artificial intelligence (AI) tool to automatically detect terrorist content. The Home Office cited tests that show the new tool can automatically detect 94% of Daesh propaganda with 99.995% accuracy. That accuracy rate translates into only 50 out of one million randomly selected videos that would require human review. The tool can run on any platform and can integrate into the video upload process to stop most extremist content before it ever reaches the internet. The tool was developed by the Home Office and ASI Data Science.
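The article's figures are internally consistent, on the assumption that the 99.995% accuracy applies to the randomly selected (overwhelmingly benign) videos: a 0.005% flagging rate over one million videos yields the roughly 50 sent for human review that the Home Office cites.

```python
# Sanity check of the Home Office figures: at 99.995% accuracy on
# randomly selected videos, the residual 0.005% of one million videos
# is the pool requiring human review.
accuracy = 0.99995
videos = 1_000_000
flagged_for_review = (1 - accuracy) * videos
print(round(flagged_for_review))  # 50
```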
- Europe > United Kingdom (1.00)
- North America > United States > California (0.08)
Home Office unveils AI program to tackle Isis online propaganda
Tool can detect 94% of Isis propaganda with a 99.99% success rate in tests
An artificial intelligence program that can detect Islamic State propaganda online with a 94% success rate has been developed, the Home Office has announced. The technology could stop the majority of Isis videos from reaching the internet by analysing the audio and images of a video file during the uploading process, and rejecting extremist content. The tool, which has been developed in partnership by the Home Office and ASI Data Science, will be made available to all internet platforms, although many major tech companies such as Facebook and Twitter already use similar technology on their own websites. The tool is aimed at tackling extremist content on smaller platforms like Vimeo, Telegraph and pCloud, which have seen a large rise in Isis propaganda. The terror group used 400 different websites to upload its content last year, research has found.
- Europe > United Kingdom (0.68)
- North America > United States > California > San Francisco County > San Francisco (0.06)
- Media (1.00)
- Law Enforcement & Public Safety > Terrorism (0.80)
- Government > Regional Government > Europe Government > United Kingdom Government (0.68)